A Distributed P2P Link Analysis Based Ranking System
Authors
Abstract
Link-based approaches are among the most popular ranking approaches employed by search engines. They use the inherent link structure of World Wide Web documents to assign each document an importance score. This score is based on the document's incoming links: a document that is pointed to by many high-quality documents should receive a higher importance score. Google's highly popular search technology [1] exemplifies the success of link-based ranking algorithms in identifying important pages. However, such link analysis algorithms suffer from some drawbacks. First, Google's PageRank algorithm has an update time extending into months, which makes frequent updating of the system infeasible. Second, the algorithm is susceptible to manipulation by malicious Web spammers, who exploit link-based analysis to favor their fake websites. This problem, commonly termed Web spam, can seriously hurt the performance of the PageRank algorithm, leading it to assign unjustifiably high PageRank scores to spam web pages. In [2], the authors propose the SourceRank approach for enhancing PageRank through source-based link analysis, which can potentially help combat the problem of Web spam. In this project, we propose to implement the SourceRank technique on top of a P2P crawler, Apoidea [3]. We plan to perform experimental analysis using the SourceRank technique on the www.gatech.edu domain and analyze the results to verify the claims made in [2]. In addition, we use the results of this experimentation to answer an important question raised in [2]: how can we identify a collection of pages that constitutes a single source?
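The link-based scoring described in the abstract can be illustrated with a minimal sketch. It assumes a plain power-iteration PageRank with a damping factor of 0.85 and, purely for illustration, treats each hostname as one source. It does not reproduce the specifics of the SourceRank method in [2], and the function names `pagerank` and `source_graph` and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


def source_graph(links):
    """Collapse a page graph into a source graph.

    Assumption for illustration only: a "source" is a hostname.
    Links between pages of the same source are dropped.
    """
    sources = defaultdict(set)
    for page, targets in links.items():
        src = urlparse(page).netloc
        sources[src]  # make sure the source exists even without out-links
        for t in targets:
            dst = urlparse(t).netloc
            if dst != src:
                sources[src].add(dst)
    return {s: sorted(d) for s, d in sources.items()}


# Hypothetical toy web: three hosts, four pages.
links = {
    "http://a.example/1": ["http://a.example/2", "http://b.example/1"],
    "http://a.example/2": ["http://b.example/1"],
    "http://b.example/1": ["http://a.example/1"],
    "http://c.example/1": ["http://b.example/1"],
}
page_rank = pagerank(links)
source_rank = pagerank(source_graph(links))
```

Grouping by hostname is only one possible answer to the question of what constitutes a single source; the abstract leaves that question open, so the grouping function is the part most likely to change in a real implementation.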
Similar resources
DisTriB: Distributed Trust Management Model Based on Gossip Learning and Bayesian Networks in Collaborative Computing Systems
The interactions among peers in Peer-to-Peer systems, as distributed collaborative systems, are based on asynchronous and unreliable communications. Trust is an essential and facilitating component in these interactions, especially in such uncertain environments. Various attacks that affect trust are possible due to the large-scale nature and openness of these systems. Peers do not have enough inform...
Aggregation of a Term Vocabulary for P2P-IR: A DHT Stress Test
There has been an increasing research interest in developing full-text retrieval based on peer-to-peer (P2P) technology. So far, these research efforts have largely concentrated on efficiently distributing an index. However, ranking of the results retrieved from the index is a crucial part in information retrieval. To determine the relevance of a document to a query, ranking algorithms use coll...
Distributed Page Ranking in Structured P2P Networks
This paper discusses techniques for performing distributed page ranking on top of structured peer-to-peer networks. Distributed page ranking is needed because the size of the web grows at a remarkable speed and centralized page ranking is not scalable. Open System PageRank is presented in this paper based on the traditional PageRank used by Google. We then propose some distributed page rank...
Efficiently Handling Dynamics in Distributed Link Based Authority Analysis
Link-based authority analysis is an important tool for ranking resources in social networks and other graphs. Previous work has presented JP, a decentralized algorithm for computing PageRank scores. The algorithm is designed to work in distributed systems, such as peer-to-peer (P2P) networks. However, the dynamics of P2P networks, one of their main characteristics, is currently not handled ...